Factors Affecting Web Page Similarity
Authors
Abstract
Tools that allow effective information organisation, access and navigation are becoming increasingly important on the Web. Similarity between web pages is central to such tools. In this paper, we examine the effect that content- and layout-related aspects of web pages have on web page similarity. We consider the textual content contained within common HTML tags, the structural layout of pages, and the query terms contained within pages. Our study shows that combinations of factors can yield more promising results than individual factors, and that different aspects of web pages affect similarity between pages in different ways. We found a number of factors that, when taken into account, can result in effective measures of similarity between web pages. Query information, in particular, proved to be important for the effective organisation of web pages.
Similar resources
Different Strokes for Different Folks: An Analysis of Similarity and Diversity in Web Search
Relying purely on query-page similarity when ranking Web search results limits the scope of the result set to the detriment of search performance. In this paper we propose that introducing diversity into the ranking metric can increase topic coverage without adversely affecting result relevance in the face of vague queries.
An Efficient Approach for Near-duplicate page detection in web crawling
The drastic development of the World Wide Web in recent times has given the concept of Web Crawling remarkable significance. The voluminous amounts of web documents swarming the web have posed huge challenges to web search engines, making their results less relevant to users. The presence of duplicate and near-duplicate web documents in abundance has created additional overhea...
Factors Affecting Information Retrieval
The Web is the largest information repository, containing web pages and continuously expanding with time, so searching it effectively and efficiently for relevant information is a challenge. In recent years, a large number of search engines have been introduced, which still deal with a few problems of indexing and retrieval of relevant information. Therefore it becomes essential to evaluate ...
Web Anomaly Detection by Creating Access Usage Profiles
Due to the increase in cyber-attacks, the need for techniques that detect attacks on web servers has drawn attention. Unfortunately, many available security solutions are inefficient at identifying web-based attacks. The main aim of this study is to detect abnormal web navigation based on web usage profiles. In this paper, comparing scrolling behavior of a normal user with an attacker, and simu...
Hybrid Adaptive Educational Hypermedia Recommender Accommodating User’s Learning Style and Web Page Features
Personalized recommenders have proved useful as a way to reduce the information overload problem. In Adaptive Hypermedia Systems in particular, a recommender is the main module that delivers suitable learning objects to learners. Recommenders suffer from the cold-start and sparsity problems. Furthermore, obtaining learners’ preferences is cumbersome. Most studies have only focused...